{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ARIMA\n",
    "\n",
    "An [AutoRegressive Integrated Moving Average](https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) model is a popular model used in time series analysis to understand the data or forecast future points.\n",
    "\n",
    "This implementation can fit a model to each time series in a batch and perform in-sample predictions and out-of-sample forecasts. It is designed to give the best performance when working on a large batch of time series.\n",
    "\n",
    "Useful links:\n",
    "\n",
    "- cuDF documentation: https://docs.rapids.ai/api/cudf/stable\n",
    "- cuML's ARIMA API docs: https://rapidsai.github.io/projects/cuml/en/stable/api.html#arima\n",
    "- a good introduction to ARIMA: https://otexts.com/fpp2/arima.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cudf\n",
    "from cuml.tsa.arima import ARIMA\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading util\n",
    "\n",
    "The data for this demo is stored in a simple CSV format:\n",
    "- the data series are stored in columns\n",
    "- the first column contains the date of the data points\n",
    "- the first row contains the name of each variable\n",
    "\n",
    "For example, let's check the *population estimate* dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat data/time_series/population_estimate.csv | head"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define a helper function to load a dataset with a given name and return a GPU dataframe. We discard the date, and limit the batch size for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_dataset(name, max_batch=4):\n",
    "    import os\n",
    "    pdf = pd.read_csv(os.path.join(\"data\", \"time_series\", \"%s.csv\" % name))\n",
    "    return cudf.from_pandas(pdf[pdf.columns[1:max_batch+1]].astype(np.float64))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualization util\n",
    "\n",
    "We define a helper function that displays the data, and optionally a prediction starting from a given index. Each time series is plot separately for better readability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def visualize(y, pred=None, pred_start=None):\n",
    "    n_obs, batch_size = y.shape\n",
    "\n",
    "    # Create the subplots\n",
    "    c = min(batch_size, 2)\n",
    "    r = (batch_size + c - 1) // c\n",
    "    fig, ax = plt.subplots(r, c, squeeze=False)\n",
    "    ax = ax.flatten()\n",
    "    \n",
    "    # Range for the prediction\n",
    "    if pred is not None:\n",
    "        pred_start = pred_start or n_obs\n",
    "        pred_end = pred_start + pred.shape[0]\n",
    "    \n",
    "    # Plot the data\n",
    "    for i in range(batch_size):\n",
    "        title = y.columns[i]\n",
    "        ax[i].plot(np.r_[:n_obs], y[title].to_array())\n",
    "        if pred is not None:\n",
    "            ax[i].plot(np.r_[pred_start:pred_end], pred[:, i],\n",
    "                             linestyle=\"--\")\n",
    "        ax[i].title.set_text(title)\n",
    "    for i in range(batch_size, r*c):\n",
    "        fig.delaxes(ax[i])\n",
    "    fig.tight_layout()\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Non-seasonal ARIMA models\n",
    "\n",
    "A basic `ARIMA(p,d,q)` model is made of three components:\n",
    " - An **Integrated** (I) component: the series is differenced `d` times until it is stationary\n",
    " - An **AutoRegressive** (AR) component: the variable is regressed on its `p` past values\n",
    " - A **Moving Average** (MA) component: the variable is regressed on `q` past error terms\n",
    "\n",
    "The model can also incorporate an optional constant term (called *intercept*).\n",
    "\n",
    "### A simple MA(2) example\n",
    "\n",
    "We start with a simple Moving Average model. Let's first load and visualize the *migrations in Auckland by age* dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_mig = load_dataset(\"net_migrations_auckland_by_age\", 4)\n",
    "visualize(df_mig)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to fit the model with `q`=2 and with an intercept.\n",
    "The `ARIMA` class accepts cuDF dataframes or array-like types as input (host or device), e.g numpy arrays. Here we already have a dataframe so we can simply pass it to the `ARIMA` constructor with the model parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_mig = ARIMA(df_mig, (0,0,2), fit_intercept=True)\n",
    "model_mig.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now forecast and visualize the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fc_mig = model_mig.forecast(10)\n",
    "visualize(df_mig, fc_mig)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note:* the returned array is a device array. You can convert it to a numpy array with the `copy_to_host` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(type(fc_mig))\n",
    "print(type(fc_mig.copy_to_host()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we want to get the parameters that were fitted to the model, we use the `get_params` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "param_mig = model_mig.get_params()\n",
    "print(param_mig.keys())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parameters are organized in 2D arrays: one row represents one parameter and the columns are different batch members."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the ma.L1 and ma.L2 parameters for each of 4 batch members\n",
    "print(param_mig[\"ma\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also get the log-likelihood of the parameters w.r.t to the series, and evaluate various information criteria:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"log-likelihood:\\n\", model_mig.llf)\n",
    "print(\"\\nAkaike Information Criterion (AIC):\\n\", model_mig.aic)\n",
    "print(\"\\nCorrected Akaike Information Criterion (AICc):\\n\", model_mig.aicc)\n",
    "print(\"\\nBayesian Information Criterion (BIC):\\n\", model_mig.bic)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### An ARIMA(1,2,1) example\n",
    "\n",
    "Let's now load the *population estimate* dataset. For this dataset a first difference is not enough to make the data stationary because of the quadratic trend, so we decide to go with `d`=2.\n",
    "\n",
    "This time we won't simply forecast but also predict in-sample:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_pop = load_dataset(\"population_estimate\")\n",
    "\n",
    "# Fit an ARIMA(1,2,1) model\n",
    "model_pop = ARIMA(df_pop, (1,2,1), fit_intercept=True)\n",
    "model_pop.fit()\n",
    "\n",
    "# Predict in-sample and forecast out-of-sample\n",
    "fc_pop = model_pop.predict(80, 160)\n",
    "visualize(df_pop, fc_pop, 80)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Seasonal ARIMA models\n",
    "\n",
    "[Seasonal ARIMA models](https://otexts.com/fpp2/seasonal-arima.html) are expressed in the form `ARIMA(p,d,q)(P,D,Q)s` and have additional seasonal components that we denote SAR and SMA.\n",
    "\n",
    "We can also choose to apply a first or second seasonal difference, or combine a non-seasonal and a seasonal difference (note: `p+P <= 2` is required).\n",
    "\n",
    "### An ARIMA(1,1,1)(1,1,1)12 example\n",
    "\n",
    "We load the *guest nights by region* dataset. This dataset shows a strong seasonal component with a period of 12 (annual cycle, monthly data), and also a non-seasonal trend. A good choice is to go with `d`=1, `D`=1 and `s`=12.\n",
    "\n",
    "We create the model with seasonal parameters, and forecast:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_guests = load_dataset(\"guest_nights_by_region\", 4)\n",
    "\n",
    "# Create and fit an ARIMA(1,1,1)(1,1,1)12 model:\n",
    "model_guests = ARIMA(df_guests, (1,1,1), (1,1,1,12), fit_intercept=False)\n",
    "model_guests.fit()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Forecast\n",
    "fc_guests = model_guests.forecast(40)\n",
    "\n",
    "# Visualize after the time step 200\n",
    "visualize(df_guests[200:], fc_guests)"
   ]
  }
 ],
 "metadata": {
  "file_extension": ".py",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "mimetype": "text/x-python",
  "name": "python",
  "npconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "version": 3
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
